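Recent work has shown the potential of transformers for computer vision applications. An image is first partitioned into patches, which are then used as input tokens for the attention mechanism. Because attention has an expensive quadratic cost, either a large patch size is used, which yields only coarse global interactions, or attention is applied only to local regions of the image, at the expense of long-range interactions. In this work, we propose an approach that allows both coarse global interactions and fine-grained local interactions already at the early layers of a vision transformer. At the core of our method is the application of local and global attention layers. In the local attention layer, we apply attention to each patch and its local shifts, resulting in virtually located local patches that are not bound to a single, specific location. These virtually located patches are then used in a global attention layer. Separating the attention layer into local and global counterparts keeps the computational cost low in the number of patches, while still supporting data-dependent localization already at the first layer, as opposed to the static positioning in other visual transformers. Our method is shown to be superior to both convolutional and transformer-based methods for image classification on CIFAR10, CIFAR100, and ImageNet. Code is available at: https://github.com/shellysheynin/locally-sag-transformer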
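The trade-off between patch size and attention cost described above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the local cost model (each patch attending over itself plus a fixed number of shifted copies) is an assumption made purely for illustration.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("patch_size must divide image_size")
    per_side = image_size // patch_size
    return per_side * per_side


def global_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs when every token attends to every token (quadratic)."""
    return n_tokens * n_tokens


def local_attention_pairs(n_tokens: int, n_shifts: int) -> int:
    """Query-key pairs when each patch attends only over itself and its
    n_shifts shifted copies (assumed cost model; linear in n_tokens)."""
    return n_tokens * (n_shifts + 1)


if __name__ == "__main__":
    # A 32x32 image (the CIFAR resolution) with three candidate patch sizes.
    for patch_size in (8, 4, 2):
        n = num_patches(32, patch_size)
        print(patch_size, n, global_attention_pairs(n), local_attention_pairs(n, 8))
```

Under these assumptions, halving the patch size quadruples the token count and multiplies the global pair count by sixteen, while the local term grows only linearly with the number of patches.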
Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in detail our observations, both specific and common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and apply this to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from existing literature studies and phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions of Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description in terms of universal and region-specific characteristics of IE, which facilitates accent conversion and adaptation of existing ASR and TTS systems to different Indian accents.
Analysis of Indian English (IE) pronunciation variabilities is useful in building systems for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis in the Indian context. Typically, these pronunciation variabilities have been explored by comparing IE pronunciation with Received Pronunciation (RP). However, exploring these variabilities requires pronunciation data labelled at the phonetic level, which is scarce for IE. Moreover, the versatility of IE stems from the influence of the large diversity of the speakers' mother tongues and from regional demographic differences. Prior linguistic works have characterised features of IE variabilities qualitatively by reporting phonetic rules that represent such variations relative to RP. These qualitative descriptions often lack quantitative descriptors and data-driven analysis of diverse IE pronunciation data to characterise IE at the phonetic level. To address these issues, in this work, we consider a corpus, Indic TIMIT, containing a large set of IE varieties from 80 speakers from various regions of India. We present an analysis that obtains a new set of phonetic rules representing IE pronunciation variabilities relative to RP in a data-driven manner. We do this using 15,974 phonetic transcriptions, of which 13,632 were obtained manually in addition to those that are part of the corpus. Furthermore, we validate the rules obtained from the analysis against the existing phonetic rules to identify the relevance of the obtained rules, and we test the efficacy of Grapheme-to-Phoneme (G2P) conversion developed from the obtained rules, using Phoneme Error Rate (PER) as the performance metric.
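The abstract names Phoneme Error Rate (PER) as the metric but does not spell out how it is computed. The sketch below uses the common definition (total edit distance between reference and predicted phoneme sequences, divided by the total number of reference phonemes), which is an assumption here rather than a detail confirmed by the source. The function names and the toy phoneme strings are hypothetical and are not taken from Indic TIMIT.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Total edit distance divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of items")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcriptions are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


if __name__ == "__main__":
    # Made-up toy example: two substitutions and one insertion against
    # four reference phonemes.
    reference = [["W", "AO", "T", "ER"]]
    predicted = [["V", "AO", "T", "AX", "R"]]
    print(phoneme_error_rate(reference, predicted))
```

Because insertions count as errors while the denominator is the reference length, PER computed this way can exceed 1.0 when a G2P model badly over-generates.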
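Motion tracking systems based on optical sensors often suffer from issues such as poor lighting conditions, occlusion, and limited coverage, and they may raise privacy concerns. More recently, radio frequency (RF)-based approaches using commercial WiFi devices have emerged, offering low-cost ubiquitous sensing while preserving privacy. However, the output of an RF sensing system, such as range-Doppler spectrograms, does not represent human motion intuitively and usually requires further processing. In this study, MDPose, a novel framework for human skeletal motion reconstruction based on WiFi micro-Doppler signatures, is proposed. It provides an effective solution for tracking human activities by reconstructing a skeleton model with 17 keypoints, which can help interpret conventional RF sensing outputs in a more understandable way. Specifically, MDPose has several incremental stages that gradually address a series of challenges. First, a denoising algorithm is implemented to remove any unwanted noise that may affect feature extraction and to enhance weak Doppler signatures. Second, a convolutional neural network (CNN)-recurrent neural network (RNN) architecture is applied to learn temporal-spatial dependencies from the clean micro-Doppler signatures and to restore the velocity information of the keypoints. Finally, a pose optimisation mechanism is employed to estimate the initial state of the skeleton and to limit the growth of error. We conducted comprehensive tests in a variety of environments, with numerous subjects and a single-receiver radar system, to demonstrate the performance of MDPose, and we report an absolute error of 29.4 mm over all keypoint positions, which outperforms state-of-the-art RF-based pose estimation systems.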
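Recent work has shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial perturbations. Adversaries can mislead the policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle but face challenges in practice, for example by being too slow to fool DRL policies in real time. We show that using the Universal Adversarial Perturbation (UAP) method to compute perturbations, independently of the individual inputs to which they are applied, can fool DRL policies effectively. We describe three such attack variants. Through an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (by up to 100%, even when the $\ell_\infty$ bound on the perturbation is as small as 0.01). Our attacks are faster than the response time of the different DRL policies (0.6 ms on average) and faster than prior attacks using adversarial perturbations (1.8 ms on average). We also show that our attack technique is efficient, incurring an online computational cost of 0.027 ms on average. Using two further tasks involving robotic movement, we confirm that our results generalize to more complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We propose an effective technique that detects all known adversarial perturbations against DRL policies, including all the universal perturbations presented in this paper.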
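One reason a universal perturbation can have a low online cost is that the perturbation is computed once, offline, and then merely added to every observation. The sketch below illustrates only that online step under an $\ell_\infty$ bound; it does not show how the perturbation itself is computed, and it is not the paper's attack code. The function names, the flat-list observation format, and the [0, 1] value range are all assumptions.

```python
from typing import List


def project_linf(delta: List[float], epsilon: float) -> List[float]:
    """Clamp every component of delta into [-epsilon, epsilon]."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return [max(-epsilon, min(epsilon, d)) for d in delta]


def apply_universal(observation: List[float], delta: List[float],
                    lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Add the same precomputed delta to an observation and keep the
    result inside the valid value range [lo, hi] (assumed here)."""
    if len(observation) != len(delta):
        raise ValueError("observation and delta must have the same length")
    return [max(lo, min(hi, x + d)) for x, d in zip(observation, delta)]


if __name__ == "__main__":
    # Hypothetical raw perturbation, projected onto the 0.01 bound
    # mentioned in the abstract.
    delta = project_linf([0.5, -0.3, 0.005], epsilon=0.01)
    # The same delta is reused for every observation; nothing is
    # recomputed per input.
    for obs in ([0.0, 1.0, 0.5], [0.2, 0.4, 0.6]):
        print(apply_universal(obs, delta))
```

The online step is a single element-wise addition and clamp, which is why its cost does not depend on the policy being attacked.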